Object cutter program

ABSTRACT

An object of the present invention is to enable extraction of objects from a predetermined Web page and linked Web pages led by hyperlinks of the Web page without inputting Web page identifiers corresponding to the linked Web pages. In order to achieve this object, processing means specifies a portion sandwiched by an object start identifier and an object end identifier from display control information and extracts a Web page identifier of a linked Web page from the specified portion based on an extracted portion identifier (S25). The processing means extracts, as an object, a portion that is sandwiched by an object start identifier and an object end identifier and that satisfies a search condition accepted from input means from display control information of the Web page corresponding to the extracted Web page identifier, and stores the portion in storage means (S35).

TECHNICAL FIELD

The present invention relates to a technique of extracting objects from existing Web pages and reusing the objects.

BACKGROUND ART

Conventionally, inventions of a Web generating apparatus or the like to extract objects, such as figures and tables, from existing Web pages and generate a new Web page have been disclosed (e.g., see Patent Document 1).

The Web generating apparatus includes a Web page generating unit to extract objects from a plurality of Web pages and generate a new Web page in a free layout, a repository management unit to store configuration information of the generated Web page and update pasted objects, and a Web page executing unit to actually generate a Web page by using the configuration information and objects stored in the repository management unit.

With this configuration, objects can be extracted from specified existing Web pages, and a new Web page can be generated by placing the extracted objects in a free layout.

However, in the above-described conventional art, when objects are to be extracted from a predetermined Web page and linked Web pages led by hyperlinks of the Web page, each of the Web page identifiers corresponding to the linked Web pages needs to be input.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 11-250054

DISCLOSURE OF INVENTION

Problems to be Solved by the Invention

An object of the present invention is to reduce inconvenience of the above-described conventional art and enable extraction of objects from a predetermined Web page and linked Web pages led by hyperlinks of the Web page without inputting Web page identifiers corresponding to the linked Web pages.

Means for Solving the Problems

In order to achieve the above-described object, the present invention adopts the following configuration.

The invention described in Claim 1 is an object cutter program used in a terminal apparatus including information storage means, information input means, communication means for communicating with an information providing system, and processing means for controlling operations of the respective means. The storage means includes a standard object database pre-storing object start identifiers each identifying the start of an object and object end identifiers each identifying the end of an object in display control information of a Web page provided by the information providing system, the object start identifiers and the object end identifiers being associated with each other. The storage means also includes a Web page identifier extracting condition database pre-storing the object start identifiers, the object end identifiers, and extracted portion identifiers each being associated with a combination of the object start and end identifiers and identifying a portion from which a Web page identifier is to be extracted.

The processing means accepts a Web page identifier to identify a Web page provided by the information providing system from the input means. Then, the processing means receives display control information of the Web page corresponding to the accepted Web page identifier from the information providing system via the communication means and stores the display control information in the storage means. Then, the processing means takes the display control information of the Web page from the storage means and takes an object start identifier, an object end identifier, and an extracted portion identifier associated with a combination of the object start and end identifiers with reference to the Web page identifier extracting condition database. Then, the processing means specifies a portion sandwiched by the taken object start identifier and object end identifier from the taken display control information and extracts a Web page identifier of a linked Web page from the specified portion based on the taken extracted portion identifier. Then, the processing means receives display control information of the Web page corresponding to the extracted Web page identifier from the information providing system via the communication means and stores the display control information in the storage means. Then, the processing means takes an object start identifier and an object end identifier associated with the object start identifier with reference to the standard object database. Then, the processing means extracts a portion as an object sandwiched by the taken object start identifier and object end identifier from the display control information stored in the storage means, and stores the portion in the storage means.

Here, the object is part of the display control information of a Web page and is a minimum unit constituting a substance displayed on the Web page. Examples of the object include a figure object represented by an <img> tag, a table object represented by a <table> tag, and a text object represented by an <a> tag and having a hyperlink.

The invention described in Claim 2 is an object cutter program used in a terminal apparatus including information storage means, information input means, communication means for communicating with an information providing system, and processing means for controlling operations of the respective means. The storage means includes a standard object database pre-storing object start identifiers each identifying the start of an object and object end identifiers each identifying the end of an object in display control information of a Web page provided by the information providing system, the object start identifiers and the object end identifiers being associated with each other. The storage means also includes a Web page identifier extracting condition database pre-storing the object start identifiers, the object end identifiers, and extracted portion identifiers each being associated with a combination of the object start and end identifiers and identifying a portion from which a Web page identifier is to be extracted.

The processing means accepts a Web page identifier to identify a Web page provided by the information providing system and a search condition from the input means. Then, the processing means receives display control information of the Web page corresponding to the accepted Web page identifier from the information providing system via the communication means and stores the display control information in the storage means. Then, the processing means takes the display control information of the Web page from the storage means and takes an object start identifier, an object end identifier, and an extracted portion identifier associated with a combination of the object start and end identifiers with reference to the Web page identifier extracting condition database. Then, the processing means specifies a portion sandwiched by the taken object start identifier and object end identifier from the taken display control information and extracts a Web page identifier of a linked Web page from the specified portion based on the taken extracted portion identifier. Then, the processing means receives display control information of the Web page corresponding to the extracted Web page identifier from the information providing system via the communication means and stores the display control information in the storage means. Then, the processing means takes an object start identifier and an object end identifier associated with the object start identifier with reference to the standard object database. Then, the processing means extracts a portion as an object that is sandwiched by the taken object start identifier and object end identifier and that satisfies the accepted search condition from the display control information stored in the storage means, and stores the portion in the storage means.

Advantages

With the above-described configuration, the processing means extracts a Web page identifier of a linked Web page from the display control information of the Web page corresponding to the Web page identifier accepted from the input means and receives the display control information of the Web page corresponding to the extracted Web page identifier from the information providing system. Accordingly, a nonconventional, excellent object cutter program capable of extracting objects from a predetermined Web page and linked Web pages led by hyperlinks of the Web page without inputting Web page identifiers corresponding to the linked Web pages can be provided.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, an embodiment of the present invention is described with reference to the drawings.

FIG. 1 is a block diagram showing an entire configuration of a system according to this embodiment. A terminal apparatus 100 and Web servers 200 serving as information providing systems connect to the Internet 300. Each of the Web servers 200 transmits display control information, such as an HTML (Hyper Text Markup Language) file, to the terminal apparatus 100 based on a request from the terminal apparatus 100. The terminal apparatus 100 extracts objects, such as figures and tables, from the display control information received from the Web server 200. Here, the Web server 200 may connect to the Internet 300 so that a third party other than a user of the terminal apparatus 100 can provide content and the like. In this embodiment, the plurality of Web servers 200 connect to the Internet 300. These Web servers 200 have a typical configuration including processing means, storage means, and communication means.

FIG. 2 shows a configuration of a typical PC (personal computer) serving as the terminal apparatus 100 to which an object cutter program 110 of the present application is applied. A keyboard 106 and a mouse 107 serving as input means; a display 108 serving as display means; a CPU 102 serving as processing means; a RAM 103, a ROM 104, and an HDD (hard disk drive) 109 serving as storage means; and an NIC (network interface card) 105 serving as communication means connect to a bus 101. An I/F represents an interface between the bus 101 and various devices. The HDD 109 stores the object cutter program 110, a standard object database 111, a Web page identifier extracting condition database 112, and so on. The CPU 102 of the terminal apparatus 100 reads the object cutter program 110 stored in the HDD 109 to the RAM 103 and executes it, so as to provide a function of extracting an object from display control information received from the Web server 200 with reference to the standard object database 111. Also, the CPU 102 of the terminal apparatus 100 extracts a Web page identifier of a linked Web page from display control information received from the Web server 200 with reference to the Web page identifier extracting condition database 112.

FIG. 3 shows a structure of the standard object database 111 stored in the HDD 109 of the terminal apparatus 100. In this embodiment, object start identifiers and object end identifiers are pre-stored in the standard object database 111 while being associated with each other. Here, the object start identifier identifies the start of an object in display control information. The object end identifier identifies the end of an object in display control information. For example, when the object is a table, the object start identifier is “<table” while the object end identifier is “</table>”.
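The association described for FIG. 3 can be pictured as a small table of start/end pairs. The following is a minimal sketch only: the names `ObjectDelimiters`, `STANDARD_OBJECT_DB`, and `end_identifier_for` are hypothetical, and apart from the table row given in the text, the rows (in particular the use of “>” as the end identifier of a figure object) are assumptions made for illustration.

```python
from typing import NamedTuple


class ObjectDelimiters(NamedTuple):
    """One row of the standard object database: a start/end identifier pair."""
    start: str
    end: str


# Hypothetical contents. Only the "<table" / "</table>" row is stated in the
# text; the other rows are assumptions for illustration.
STANDARD_OBJECT_DB = [
    ObjectDelimiters(start="<table", end="</table>"),
    ObjectDelimiters(start="<a", end="</a>"),
    ObjectDelimiters(start="<img", end=">"),
]


def end_identifier_for(start: str) -> str:
    """Return the end identifier associated with the given start identifier."""
    for row in STANDARD_OBJECT_DB:
        if row.start == start:
            return row.end
    raise KeyError(start)
```

Storing the pair in one row keeps the start and end identifiers associated with each other, so a lookup by start identifier always yields the matching end identifier.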

FIG. 4 shows a structure of the Web page identifier extracting condition database 112 stored in the HDD 109 of the terminal apparatus 100. In this embodiment, object start identifiers, object end identifiers, and extracted portion identifiers each associated with a combination of object start and end identifiers are pre-stored in the Web page identifier extracting condition database 112. Here, the object start identifiers and object end identifiers are the same as those stored in the standard object database 111. The extracted portion identifier specifies a portion to be extracted when a Web page identifier of a linked Web page is to be extracted from an object specified by an object start identifier and an object end identifier. For example, when the object start identifier, the object end identifier, and the extracted portion identifier are “<a”, “</a>”, and “href=”, respectively, the portion that is sandwiched by “<a” and “</a>” and that is described immediately after “href=” in the display control information is the Web page identifier of the linked Web page.
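As an illustration of the structure described for FIG. 4, each row can be modeled as a triple keyed by the start/end combination. This is a sketch under assumptions: the names `LinkCondition`, `LINK_CONDITION_DB`, and `portion_identifier_for` are hypothetical, and the row uses `href=` as the extracted portion identifier because that is the attribute carrying the link in the `<a href=...>` text object of FIG. 7.

```python
from typing import NamedTuple, Optional


class LinkCondition(NamedTuple):
    """One row of the Web page identifier extracting condition database."""
    start: str    # object start identifier
    end: str      # object end identifier
    portion: str  # extracted portion identifier


# Hypothetical contents: a single row for hyperlinked text objects.
LINK_CONDITION_DB = [
    LinkCondition(start="<a", end="</a>", portion="href="),
]


def portion_identifier_for(start: str, end: str) -> Optional[str]:
    """Look up the extracted portion identifier for a start/end combination."""
    for row in LINK_CONDITION_DB:
        if (row.start, row.end) == (start, end):
            return row.portion
    return None  # this kind of object carries no Web page identifier
```

Object kinds with no row, such as tables in this sketch, simply yield no link to follow.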

Now, an operation of the terminal apparatus 100 according to this embodiment is described.

FIG. 5 is a flowchart showing a process performed by the CPU 102 of the terminal apparatus 100 by reading the object cutter program 110 to the RAM 103 and executing it.

First, the CPU 102 of the terminal apparatus 100 displays a screen used for inputting a URL (uniform resource locator) as a Web page identifier and one or more keywords as a search condition on the display 108 (S10). FIG. 6 shows an example of an input screen 600. The input screen 600 is provided with a URL input field 601 for inputting a URL of a Web page from which an object is to be extracted, a keyword input field 602 for inputting one or more keywords of the object to be extracted, and an OK button 603. On this screen, a user inputs a URL and one or more keywords to the URL input field 601 and the keyword input field 602, respectively, by using the keyboard 106.

Upon press of the OK button 603 by the mouse 107, the CPU 102 of the terminal apparatus 100 accepts the URL of a Web page input to the URL input field 601 and the keyword(s) input to the keyword input field 602 (S15).

Then, the CPU 102 of the terminal apparatus 100 transmits a request for obtaining the Web page to the Web server 200 based on the accepted URL of the Web page. The processing means of the Web server 200 transmits display control information of the requested Web page to the terminal apparatus 100 based on the received request for obtaining the Web page. FIG. 7 shows an example of an HTML file as the display control information received by the terminal apparatus 100. The HTML file 700 includes “<a href=“http://xxx/sub-page1.htm”>SUB-PAGE1</a>” 701, which is a text object having a hyperlink, and “<img src=“picture1.gif”>” 702, which is a figure object. The CPU 102 of the terminal apparatus 100 stores the received HTML file 700 as display control information in the HDD 109 (S20).

Then, the CPU 102 of the terminal apparatus 100 extracts the URL of a linked Web page from the HTML file 700 stored in the HDD 109 in the following manner.

The CPU 102 of the terminal apparatus 100 takes an object start identifier, an object end identifier, and an extracted portion identifier associated with a combination of the object start and end identifiers, with reference to the Web page identifier extracting condition database 112. Then, the CPU 102 of the terminal apparatus 100 extracts, from the HTML file 700 stored in the HDD 109, the portion that is sandwiched by the taken object start identifier and object end identifier and that is described immediately after the extracted portion identifier, as the URL of the linked Web page (S25).

For example, in the case of the HTML file 700 shown in FIG. 7, the CPU 102 of the terminal apparatus 100 extracts the URL of the linked Web page in the following manner. The CPU 102 of the terminal apparatus 100 takes “<a” as the object start identifier, “</a>” as the object end identifier, and “href=” as the extracted portion identifier associated with a combination of the object start and end identifiers, with reference to the Web page identifier extracting condition database 112. Then, the CPU 102 of the terminal apparatus 100 extracts, from the HTML file 700 shown in FIG. 7, the portion that is sandwiched by “<a” and “</a>” and that is described immediately after “href=”, namely “http://xxx/sub-page1.htm”, as the URL of the linked Web page.
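The extraction in step S25 can be sketched with plain substring searches. The example below is illustrative only: the function name `extract_link_urls` and the constant `HTML_700` (a reduced stand-in for the file of FIG. 7, written with straight quotes) are hypothetical, and `href=` is used as the extracted portion identifier because that is the attribute present in the `<a>` object of FIG. 7.

```python
def extract_link_urls(html, start, end, portion):
    """Return the values written right after `portion` inside each
    start...end span of `html` (a naive substring scan, not an HTML parser)."""
    urls = []
    pos = 0
    while True:
        s = html.find(start, pos)
        if s == -1:
            break
        e = html.find(end, s + len(start))
        if e == -1:
            break
        span = html[s:e + len(end)]          # the portion sandwiched by the identifiers
        p = span.find(portion)
        if p != -1:
            rest = span[p + len(portion):]
            if rest[:1] in ('"', "'"):       # quoted attribute value
                close = rest.find(rest[0], 1)
                if close != -1:
                    urls.append(rest[1:close])
            else:                            # unquoted value: stop at space or '>'
                value = rest.split(">", 1)[0].split(None, 1)
                if value:
                    urls.append(value[0])
        pos = e + len(end)
    return urls


# Reduced, hypothetical stand-in for the HTML file 700 of FIG. 7.
HTML_700 = ('<html><body>'
            '<a href="http://xxx/sub-page1.htm">SUB-PAGE1</a>'
            '<img src="picture1.gif">'
            '</body></html>')

links = extract_link_urls(HTML_700, "<a", "</a>", "href=")
```

Because the scan matches raw substrings, a start identifier such as `<a` would also match tags like `<abbr`; a practical implementation would likely need stricter matching.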

Then, the CPU 102 of the terminal apparatus 100 transmits a request for obtaining the linked Web page to the Web server 200 based on the extracted URL. Then, the processing means of the Web server 200 transmits the HTML file of the requested Web page to the terminal apparatus 100 based on the received request for obtaining the linked Web page. FIG. 8 shows an example of the HTML file 800 of the linked Web page received by the terminal apparatus 100. The HTML file 800 includes “<img src=“picture2.gif”>”, which is a figure object. Then, the CPU 102 of the terminal apparatus 100 stores the received HTML file 800 as display control information in the HDD 109 (S30).

Accordingly, the CPU 102 of the terminal apparatus 100 stores the HTML file 700 of the Web page corresponding to the URL input on the input screen 600 through the keyboard 106 and the HTML file 800 of the linked Web page led by a hyperlink of the Web page in the HDD 109. In this embodiment, the CPU 102 of the terminal apparatus 100 receives only the display control information of the Web page corresponding to the URL input through the keyboard 106 and the display control information of the linked Web page led by a hyperlink of the Web page from the Web server 200 and stores the information in the HDD 109. Alternatively, the CPU 102 may extract a Web page identifier of a Web page further led by a hyperlink of the linked Web page, receive display control information of the Web page corresponding to the Web page identifier from the Web server 200, and store the information in the HDD 109 in the above-described method.
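The difference between the embodiment (the input page plus its directly linked pages) and the alternative (following hyperlinks of the linked pages as well) can be expressed as a depth limit. The sketch below is an assumption-laden illustration: `collect_pages`, `SITE`, `toy_fetch`, and `toy_links` are hypothetical names, the pages other than sub-page1 are invented for the example, and the network request is replaced by a dictionary lookup so the code runs offline.

```python
import re
from collections import deque


def collect_pages(start_url, fetch, extract_links, max_depth=1):
    """Fetch start_url and the pages reachable within max_depth hyperlinks.

    max_depth=1 corresponds to the embodiment; larger values correspond to
    the alternative of following links of linked pages. Returns {url: html}.
    """
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:                 # do not fetch the same page twice
            continue
        pages[url] = fetch(url)
        if depth < max_depth:
            for link in extract_links(pages[url]):
                queue.append((link, depth + 1))
    return pages


# Hypothetical in-memory stand-in for the Web servers 200.
SITE = {
    "http://xxx/index.htm": '<a href="http://xxx/sub-page1.htm">SUB-PAGE1</a>',
    "http://xxx/sub-page1.htm": ('<a href="http://xxx/sub-page2.htm">SUB-PAGE2</a>'
                                 '<img src="picture2.gif">'),
    "http://xxx/sub-page2.htm": '<img src="picture3.gif">',
}


def toy_fetch(url):
    return SITE.get(url, "")


def toy_links(html):
    # Simplified link extraction, used only to keep this sketch short.
    return re.findall(r'href="([^"]*)"', html)


one_level = collect_pages("http://xxx/index.htm", toy_fetch, toy_links, max_depth=1)
two_levels = collect_pages("http://xxx/index.htm", toy_fetch, toy_links, max_depth=2)
```

The sketch assumes absolute URLs; relative hyperlinks would first have to be resolved against the URL of the page containing them.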

Then, the CPU 102 of the terminal apparatus 100 takes an object start identifier and an object end identifier associated with the object start identifier with reference to the standard object database 111. Then, the CPU 102 of the terminal apparatus 100 extracts, from the HTML file stored in the HDD 109, the portion that is sandwiched by the taken object start identifier and object end identifier and that includes the keyword(s) input on the input screen 600 through the keyboard 106, as an object satisfying the search condition (S35). Then, the CPU 102 of the terminal apparatus 100 stores the extracted object in the storage means.

For example, when the keyword input to the keyword input field 602 on the input screen 600 is “gif”, the CPU 102 of the terminal apparatus 100 extracts <img src=“picture1.gif”> and <img src=“picture2.gif”> as objects satisfying the search condition from the HTML files shown in FIGS. 7 and 8 stored in the HDD 109 and stores the objects in the HDD 109. FIG. 9 shows an example of a state where the CPU 102 of the terminal apparatus 100 stores the objects satisfying the search condition in the HDD 109.
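The keyword example can be reproduced with a short scan over the stored files. This is a sketch under stated assumptions: `extract_objects`, `DELIMITERS`, `HTML_700`, and `HTML_800` are hypothetical names, the files are reduced stand-ins for FIGS. 7 and 8 written with straight quotes, “>” is assumed as the end identifier of a figure object, and an object is taken to satisfy the search condition when it contains every keyword (the text does not say whether all or any keyword must match).

```python
def extract_objects(html, delimiters, keywords):
    """Return every start...end span of `html` that contains all keywords."""
    found = []
    for start, end in delimiters:
        pos = 0
        while True:
            s = html.find(start, pos)
            if s == -1:
                break
            e = html.find(end, s + len(start))
            if e == -1:
                break
            obj = html[s:e + len(end)]
            # Assumption: every keyword must appear inside the object.
            if all(k in obj for k in keywords):
                found.append(obj)
            pos = e + len(end)
    return found


# Assumed rows of the standard object database (start, end).
DELIMITERS = [("<table", "</table>"), ("<a", "</a>"), ("<img", ">")]

# Reduced, hypothetical stand-ins for the HTML files 700 and 800.
HTML_700 = ('<html><body>'
            '<a href="http://xxx/sub-page1.htm">SUB-PAGE1</a>'
            '<img src="picture1.gif">'
            '</body></html>')
HTML_800 = '<html><body><img src="picture2.gif"></body></html>'

stored_objects = []
for page in (HTML_700, HTML_800):
    stored_objects.extend(extract_objects(page, DELIMITERS, ["gif"]))
```

With the keyword “gif”, the hyperlinked text object is skipped because it does not contain the keyword, and the two figure objects remain.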

With the above-described process, when objects are to be extracted from display control information of a predetermined Web page and linked Web pages led by hyperlinks of the Web page, the objects can be extracted without inputting Web page identifiers corresponding to the linked Web pages.

The objects stored in the HDD 109 can be used in the following way.

The CPU 102 of the terminal apparatus 100 displays substances for display, such as a figure and a table, corresponding to the stored objects and buttons associated with the objects on the display 108. Upon press of one of the buttons by the mouse 107, the CPU 102 of the terminal apparatus 100 adds the object associated with the pressed button to the display control information of a new Web page, so that the new Web page can be generated.
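The reuse step can be pictured as appending selected objects to the display control information of a new page. The following is only a rough sketch: the class `NewPageBuilder` and its methods are hypothetical, a button press is modeled as a method call with the index of the stored object, and the surrounding HTML skeleton is an assumption.

```python
class NewPageBuilder:
    """Collects the objects chosen by button presses and renders a new page."""

    def __init__(self, stored_objects):
        self.stored = list(stored_objects)  # objects previously stored in the HDD
        self.selected = []                  # objects added to the new page so far

    def press_button(self, index):
        """Model a press of the button associated with stored object `index`."""
        self.selected.append(self.stored[index])

    def render(self):
        """Return the display control information of the new Web page."""
        body = "\n".join(self.selected)
        return "<html>\n<body>\n" + body + "\n</body>\n</html>\n"


builder = NewPageBuilder(['<img src="picture1.gif">', '<img src="picture2.gif">'])
builder.press_button(1)
new_page = builder.render()
```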

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an entire configuration of a system.

FIG. 2 is a block diagram showing a configuration of a terminal apparatus.

FIG. 3 shows a structure of a standard object database.

FIG. 4 shows a structure of a Web page identifier extracting condition database.

FIG. 5 is a flowchart showing a process in the terminal apparatus.

FIG. 6 shows an example of an input screen.

FIG. 7 shows an example of an HTML file (display control information).

FIG. 8 shows an example of an HTML file (display control information) of a linked Web page.

FIG. 9 shows an example of objects satisfying a search condition.

CLAIMS

1. An object cutter program used in a terminal apparatus including information storage means, information input means, communication means for communicating with an information providing system, and processing means for controlling operations of the respective means, the storage means including: a standard object database pre-storing object start identifiers each identifying the start of an object and object end identifiers each identifying the end of an object in display control information of a Web page provided by the information providing system, the object start identifiers and the object end identifiers being associated with each other; and a Web page identifier extracting condition database pre-storing the object start identifiers, the object end identifiers, and extracted portion identifiers each being associated with a combination of the object start and end identifiers and identifying a portion from which a Web page identifier is to be extracted, and the processing means being allowed to execute:
a) a step of accepting a Web page identifier to identify a Web page provided by the information providing system from the input means;
b) a step of receiving display control information of the Web page corresponding to the accepted Web page identifier from the information providing system via the communication means and storing the display control information in the storage means;
c) a step of taking the display control information of the Web page from the storage means and taking an object start identifier, an object end identifier, and an extracted portion identifier associated with a combination of the object start and end identifiers with reference to the Web page identifier extracting condition database;
d) a step of specifying a portion sandwiched by the taken object start identifier and object end identifier from the taken display control information and extracting a Web page identifier of a linked Web page from the specified portion based on the taken extracted portion identifier;
e) a step of receiving display control information of the Web page corresponding to the extracted Web page identifier from the information providing system via the communication means and storing the display control information in the storage means;
f) a step of taking an object start identifier and an object end identifier associated with the object start identifier with reference to the standard object database; and
g) a step of extracting a portion as an object sandwiched by the taken object start identifier and object end identifier from the display control information stored in the step b) and the step e), and storing the portion in the storage means.

2. An object cutter program used in a terminal apparatus including information storage means, information input means, communication means for communicating with an information providing system, and processing means for controlling operations of the respective means, the storage means including: a standard object database pre-storing object start identifiers each identifying the start of an object and object end identifiers each identifying the end of an object in display control information of a Web page provided by the information providing system, the object start identifiers and the object end identifiers being associated with each other; and a Web page identifier extracting condition database pre-storing the object start identifiers, the object end identifiers, and extracted portion identifiers each being associated with a combination of the object start and end identifiers and identifying a portion from which a Web page identifier is to be extracted, and the processing means being allowed to execute:
a) a step of accepting a Web page identifier to identify a Web page provided by the information providing system and a search condition from the input means;
b) a step of receiving display control information of the Web page corresponding to the accepted Web page identifier from the information providing system via the communication means and storing the display control information in the storage means;
c) a step of taking the display control information of the Web page from the storage means and taking an object start identifier, an object end identifier, and an extracted portion identifier associated with a combination of the object start and end identifiers with reference to the Web page identifier extracting condition database;
d) a step of specifying a portion sandwiched by the taken object start identifier and object end identifier from the taken display control information and extracting a Web page identifier of a linked Web page from the specified portion based on the taken extracted portion identifier;
e) a step of receiving display control information of the Web page corresponding to the extracted Web page identifier from the information providing system via the communication means and storing the display control information in the storage means;
f) a step of taking an object start identifier and an object end identifier associated with the object start identifier with reference to the standard object database; and
g) a step of extracting a portion as an object that is sandwiched by the taken object start identifier and object end identifier and that satisfies the accepted search condition from the display control information stored in the step b) and the step e), and storing the portion in the storage means.